EVOC 20 TrackOscillator U/V detection parameters

Human speech consists of a series of voiced sounds (tonal sounds, or formants) and unvoiced sounds. The main distinction between the two is that voiced sounds are produced by an oscillation of the vocal cords, whereas unvoiced sounds are produced by blocking and restricting the airflow with the lips, tongue, palate, throat, and larynx.

If speech containing voiced and unvoiced sounds is used as a vocoder analysis signal but the synthesis engine doesn’t differentiate between voiced and unvoiced sounds, the result sounds rather weak. To avoid this problem, the synthesis section of the vocoder must produce different sounds for the voiced and unvoiced parts of the signal.

EVOC 20 TrackOscillator includes an Unvoiced/Voiced (U/V) detector. This unit detects the unvoiced portions of the sound in the analysis signal and then replaces the corresponding portions of the synthesis signal with noise, with a mixture of noise and synthesizer signal, or with the original signal. When the U/V detector detects voiced parts, it passes this information to the synthesis section, which uses the normal synthesis signal for these portions.
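The detect-and-substitute behavior described above can be illustrated with a minimal sketch. It does not show how EVOC 20 TrackOscillator works internally, which this section does not document. It assumes a common textbook heuristic instead: noise-like (unvoiced) frames cross zero far more often than tonal (voiced) frames. All function names, the threshold, and the noise level are hypothetical.

```python
import math
import random


def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_voiced(frame, zcr_threshold=0.25):
    """Assumed rule: few zero crossings means a tonal (voiced) frame."""
    return zero_crossing_rate(frame) < zcr_threshold


def substitute_unvoiced(analysis_frames, synth_frames, noise_level=0.5, seed=0):
    """Keep the synthesis frame where the analysis frame is voiced;
    replace it with white noise where the analysis frame is unvoiced."""
    rng = random.Random(seed)
    out = []
    for analysis, synth in zip(analysis_frames, synth_frames):
        if is_voiced(analysis):
            out.append(list(synth))
        else:
            out.append([noise_level * rng.uniform(-1.0, 1.0) for _ in synth])
    return out


SAMPLE_RATE = 8000  # Hz, chosen only for this demo
FRAME = 256         # samples per frame, chosen only for this demo

# A 150 Hz sine stands in for a voiced frame, white noise for an unvoiced one.
voiced = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(FRAME)]
noise_rng = random.Random(1)
unvoiced = [noise_rng.uniform(-1.0, 1.0) for _ in range(FRAME)]

# A 220 Hz sine stands in for the vocoder's normal synthesis signal.
synth = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(FRAME)]

result = substitute_unvoiced([voiced, unvoiced], [synth, synth])
```

In this sketch the first output frame is the unchanged synthesis signal and the second is noise. Mixing noise with the synthesizer signal, or passing the original signal through, would be variations on the `else` branch.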

A short introduction to formants

A formant is a peak in the frequency spectrum of a sound. In the context of human voices, formants are the key component that enables listeners to distinguish between different vowel sounds, based purely on the frequency content of the sounds. Formants in human speech and singing are produced by the vocal tract, with most vowel sounds containing four or more formants.
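To make "a peak in the frequency spectrum" concrete, the following sketch computes a magnitude spectrum with a plain discrete Fourier transform and reports its local maxima. It rests on simplifying assumptions. The test signal is two pure sine components at arbitrarily chosen frequencies (750 Hz and 1250 Hz), whereas real formants are broader resonances of the vocal tract. The function names and the `min_fraction` threshold are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitude of the DFT for bins 0 .. N/2 - 1 (direct O(N^2) form)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def spectral_peaks(magnitudes, sample_rate, n, min_fraction=0.1):
    """Return the frequencies (Hz) of local maxima that exceed
    min_fraction of the largest magnitude."""
    floor = min_fraction * max(magnitudes)
    return [
        k * sample_rate / n
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > floor
        and magnitudes[k] > magnitudes[k - 1]
        and magnitudes[k] >= magnitudes[k + 1]
    ]


SAMPLE_RATE = 8000  # Hz, demo value
N = 256             # samples, giving a bin width of 31.25 Hz

# Two components that fall exactly on DFT bins 24 and 40.
signal = [
    math.sin(2 * math.pi * 750 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1250 * i / SAMPLE_RATE)
    for i in range(N)
]

peaks = spectral_peaks(magnitude_spectrum(signal), SAMPLE_RATE, N)
print(peaks)  # [750.0, 1250.0]
```

A practical formant estimator would smooth the spectrum or fit a spectral envelope before picking peaks. This sketch skips that step to keep the idea visible.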

Figure. U/V Detection parameters.

U/V detection parameters